Outlier Detection for Robust Multi-dimensional Scaling

نویسندگان

  • Leonid Blouvshtein
  • Daniel Cohen-Or
چکیده

Multi-dimensional scaling (MDS) plays a central role in data-exploration, dimensionality reduction and visualization. State-of-the-art MDS algorithms are not robust to outliers, yielding significant errors in the embedding even when only a handful of outliers are present. In this paper, we introduce a technique to detect and filter outliers based on geometric reasoning. We test the validity of triangles formed by three points, and mark a triangle as broken if its triangle inequality does not hold. The premise of our work is that unlike inliers, outlier distances tend to break many triangles. Our method is tested and its performance is evaluated on various datasets and distributions of outliers. We demonstrate that for a reasonable amount of outliers, e.g., under 20%, our method is effective, and leads to a high embedding quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Simultaneous robust estimation of multi-response surfaces in the presence of outliers

A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...

متن کامل

Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

The purpose of this paper is to identify the effective points on the performance of one of the important algorithm of data mining namely support vector machine. The final classification decision has been made based on the small portion of data called support vectors. So, existence of the atypical observations in the aforementioned points, will result in deviation from the correct decision. Thus...

متن کامل

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

متن کامل

MCODE: Multivariate Conditional Outlier Detection

Outlier detection aims to identify unusual data instances that deviate from expected patterns. The outlier detection is particularly challenging when outliers are context dependent and when they are defined by unusual combinations of multiple outcome variable values. In this paper, we develop and study a new conditional outlier detection approach for multivariate outcome spaces that works by (1...

متن کامل

RobCoP: A Matlab Package for Robust CoPlot Analysis

The graphical representation method, Robust CoPlot, is a robust variant of the classical CoPlot method. CoPlot is an adaptation of multidimensional scaling (MDS), and is a practical tool for visual inspection and rich interpretation of multivariate data. CoPlot enables presentation of a multidimensional dataset in a two dimensions, in a manner that relations between both variables and observati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1802.02341  شماره 

صفحات  -

تاریخ انتشار 2018